# Lightweight multimodal models
SmolVLM Instruct GGUF
Apache-2.0
SmolVLM is a compact, open-source multimodal model that accepts image and text inputs and generates text outputs. It is designed for efficiency and is well suited to on-device applications.
Image-to-Text
Transformers, English

Mungert
1,023
2
SmolVLM2 2.2B Instruct
Apache-2.0
SmolVLM2-2.2B is a lightweight multimodal model designed for analyzing video content. It can process video, image, and text inputs and generate text outputs.
Image-to-Text
Transformers, English

HuggingFaceTB
62.56k
164
UForm Gen2 Qwen 500M
Apache-2.0
UForm-Gen is a small generative vision-language model used mainly for image captioning and visual question answering.
Image-to-Text
Transformers, English

unum-cloud
17.98k
76